Introduction

In this project, we use a dataset from the 2012 French national census. The dataset gives us information on the “communes” and regions with a population exceeding 2,000 inhabitants.

Our objective is to analyse this data for the regions of Bourgogne and Franche-Comté. The following map shows the departments of Bourgogne and Franche-Comté. We will analyse the housing status of the residents in relation to their age group, gender, level of education, employment status and the type of housing they live in.

Data Variable and Category Details

The demographic study of Bourgogne and Franche-Comté will involve an analysis of its individual “communes,” with two separate analyses to be conducted. All data will be presented as percentages.

  • Person belonging to an age class:

    • 15-29 years old (age_15_29)
    • 30-74 years old (age_30_74_pct)
    • 75 years old or more (age_75p)
  • Gender of the residents:

    • Men (pop_hommes_pct)
    • Women (femmes)
  • The level of Education:

    • No Diploma (dipl_aucun)
    • Has at least one diploma (has_dip)
  • Employment Status:

    • Population having an occupation (agric, indep, cadres, interm, ouvr)
    • Population having no occupation (chom)
  • Housing type of residents:

    • Living in secondary housing (resid_sec)
    • Living in HLM (hlm)
    • Living in a house (maison)
    • Living in an apartment (appart)
  • Housing status of residents:

    • Owner of a property (proprio)
    • Renting a property (locataire)

Variable names used in the R code

  • Data_Ques_df : All the data of the data set
  • vars: Names of the column variables
  • bg_fc: Region of Bourgogne and Franche-Comté only
  • unemp_rate: Unemployment rate
  • data_Women_Men: % of Men and Women per region

Using the dataset, we will analyse the relationships between the variables mentioned above to gain insight into the economic status of the people of Bourgogne and Franche-Comté. We address the following questions:

  • What is the relationship between the housing type of the residents and their education and employment status?

  • Is the gender of a person correlated with their type of occupation and education level in each region?

  • Is the type of property a person lives in influenced by their gender, occupation status or education?

As a first step, we create the 3 new variables that are missing for our analysis and use them along with the chosen variables as follows:

  • has_dip = pop_dipl_bepc + pop_dipl_capbep + pop_dipl_bac + pop_dipl_bac2 + pop_dipl_sup
  • pop_30_74 = pop_tot - pop_0_14 - pop_15_29 - pop_75p
  • pop_hommes = pop_tot - pop_femmes
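The three formulas above can be sketched in base R as follows. The data frame `toy` and its counts are invented for illustration and stand in for the census table. The use of pop_tot as the denominator for the `_pct` columns is an assumption, since the section does not state how the percentages are computed.

```r
# Toy stand-in for the census table; the counts are invented for illustration.
toy <- data.frame(
  pop_tot         = c(5000, 12000),
  pop_0_14        = c(900, 2100),
  pop_15_29       = c(800, 2600),
  pop_75p         = c(500, 1000),
  pop_femmes      = c(2600, 6200),
  pop_dipl_bepc   = c(300, 700),
  pop_dipl_capbep = c(900, 2000),
  pop_dipl_bac    = c(600, 1500),
  pop_dipl_bac2   = c(400, 1100),
  pop_dipl_sup    = c(350, 1200)
)

# The three derived variables, following the formulas listed above.
toy$has_dip <- with(toy, pop_dipl_bepc + pop_dipl_capbep + pop_dipl_bac +
                         pop_dipl_bac2 + pop_dipl_sup)
toy$pop_30_74  <- with(toy, pop_tot - pop_0_14 - pop_15_29 - pop_75p)
toy$pop_hommes <- with(toy, pop_tot - pop_femmes)

# Assumption: percentages are taken relative to the total population.
toy$age_30_74_pct  <- 100 * toy$pop_30_74 / toy$pop_tot
toy$pop_hommes_pct <- 100 * toy$pop_hommes / toy$pop_tot

toy[, c("has_dip", "pop_30_74", "pop_hommes", "age_30_74_pct", "pop_hommes_pct")]
```

Keeping the counts and the percentages in separate columns makes it easier to check that every variable entering the later analysis is on the same scale.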

Description of the Variables in the data set

Column_Name Description
INSEE_COM INSEE code
commune Commune
code_region Code of the region
region Name of the region
code_departement Code of the department
departement Name of the department
pop_tot Total Population
pop_cl Population Class
pop_0_14 Population aged 0-14 years
pop_15_29 Population aged 15-29 years
pop_18_24 Population between 18-24 years
pop_75p Population aged 75 years and over
pop_femmes Population of Women
pop_act_15p Active population aged 15 and over
pop_chom Unemployed population
pop_agric Population working in agriculture
pop_indep Population of Independent employment
pop_cadres Population working as managers or executives
pop_interm Population working in intermediate professions
pop_empl Population Employed (salariés)
pop_ouvr Manual workers
pop_scol_18_24 Population aged between 18-24 pursuing an education
pop_non_scol_15p Population aged 15 and over not in school
pop_dipl_aucun Population having no diploma
pop_dipl_bepc Population having a BEPC diploma (a French diploma obtained after completing lower secondary education)
pop_dipl_capbep Population having a CAP or BEP diploma (a French vocational diploma obtained after completing secondary education)
pop_dipl_bac Population having a Baccalaureate diploma (a French diploma obtained after completing upper secondary education)
pop_dipl_bac2 Population having a 2-year post-Baccalaureate diploma (a French diploma obtained after completing 2 years of higher education after the Baccalaureate).
pop_dipl_sup Population with a higher education diploma (more than 2 years of higher education)
log_rp Population having a main residential property
log_proprio Population owning their own home
log_loc Population renting their home
log_hlm Population living in HLM (French social housing program)
log_sec Population living in secondary housing
log_maison Population living in a house
log_appart Population living in an apartment
age_0_14 Percentage of population aged 0-14 years
age_15_29 Percentage of population aged 15-29 years
age_75p Percentage of population aged 75 years and over
femmes Percentage of female population
chom Percentage of unemployed population
agric Percentage of population working in agriculture
indep Percentage of population working as independent workers
cadres Percentage of population working as managers or executives
interm Percentage of population working in intermediate professions
empl Percentage of population working as employees (salariés)
ouvr Percentage of population working as manual workers
etud Percentage of population studying
dipl_aucun Percentage of population with no diploma
dipl_bepc Percentage of population with a BEPC diploma (a French diploma obtained after completing lower secondary education)
dipl_capbep Percentage of population with a CAP or BEP diploma (a French vocational diploma obtained after completing secondary education)
dipl_bac Percentage of population with a Baccalaureate diploma (a French diploma obtained after completing upper secondary education)
dipl_bac2 Percentage of population with a 2-year post-Baccalaureate diploma (a French diploma obtained after completing 2 years of higher education after the Baccalaureate)
dipl_sup Percentage of population with a higher education diploma (more than 2 years of higher education)
resid_sec Percentage of population living in secondary housing
proprio Percentage of population owning a home
locataire Percentage of population renting a home
hlm Percentage of population living in HLM (French social housing program)
maison Percentage of population living in a house
appart Percentage of population living in an apartment

Descriptive Statistics

We now extract some descriptive statistics from our data for the regions of Bourgogne and Franche-Comté.

Population concentration of Bourgogne and Franche-Comté

The following map gives an overview of the concentration of the population.

From the bar plot above, we can see that more residents were recorded in Dijon, Chalon-sur-Saône and Nevers in the Bourgogne region. In the Franche-Comté region, the highest numbers of residents are in Besançon and Belfort.

Age group by gender of the population

# Requires dplyr (for %>%), tidyr (for gather) and ggplot2
library(dplyr)
library(tidyr)
library(ggplot2)

# Reshape to long format: first the age-group columns, then the gender columns
age_sex_long <- age_sex %>% 
  gather(key = "age_group", value = "population", pop_0_14, pop_15_29, pop_30_74, pop_75p) %>% 
  gather(key = "gender", value = "count", Hommes, Femmes)

# Stacked bars of population by region, with one panel per gender
ggplot(age_sex_long, aes(x = region, y = count, fill = age_group)) +
  geom_bar(stat = "identity", position = "stack") +
  facet_wrap(~ gender, ncol = 2, scales = "free_y") +
  labs(title = "Population by Age Group, Region, and Gender", x = "Region", y = "Population") +
  scale_fill_manual(values = c("#E69F00", "#56B4E9", "#009E73", "#F0E442")) +
  theme_minimal()

From the bar chart above, we can see a larger number of men than women in both regions. We can also see that the age groups are distributed in almost the same proportions for each gender in both Bourgogne and Franche-Comté.

Unemployment rate by region and department

Both bar charts show that the level of unemployment is higher in Franche-Comté than in Bourgogne. Looking at unemployment per department for those two regions, Territoire de Belfort, Yonne, Doubs and Haute-Saône have the highest levels of unemployment.

Percentage of Employment per department

People having no diploma per Region and Department

# Box plot of the percentage of people without a diploma, by department
library(ggplot2)

G_age <- ggplot(data = bg_fc, mapping = aes(x = departement, y = dipl_aucun, col = region)) + 
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

G_age

We can see higher percentages of individuals without a diploma in Haute-Saône and Yonne.

Housing Ownership by regions

region perc_propio perc_locataire
Bourgogne 23.99 22.73
Franche-Comte 22.15 23.92

In the Franche-Comté region, a higher percentage of people rent their home than own it. The opposite holds for the Bourgogne region, which has a relatively higher percentage of people owning their home.

Type of Housing by regions

region perc_sec perc_maison perc_appart
Bourgogne 1.38 21.45 25.86
Franche-Comte 1.20 17.46 29.15

We can see a higher percentage of individuals living in an apartment than in a house in both regions. However, only a small percentage of people live in secondary housing in either region.

General Analysis of the data (Correlation)

We have gained some insights into the data of both Bourgogne and Franche-Comté. We have seen some differences between the regions, and the variables do not all tell the same story. Now, we are going to study the dataset of the 2 regions (Bourgogne and Franche-Comté) to see whether relationships exist between the variables.

Bourgogne and Franche-Comté correlation plot

In the correlation plot, darker colors indicate stronger correlation, and the orientation of each ellipse indicates the sign of the correlation. We can observe the following relationships between our variables:

  • The 15-29 age group is positively correlated with having a diploma, renting, and living in HLM and apartments, and negatively correlated with owning a home. The 30-74 age group shows the opposite behavior. The 75 and over age group is positively correlated with women and negatively correlated with men.

  • The two genders behave in opposite ways. The share of women is positively correlated with renting and with living in HLM and apartments, and negatively correlated with owning a property and living in a house. Conversely, the men category shows the opposite behavior.

  • The unemployment category (chom) is positively correlated with manual workers, having a diploma, renting, and living in HLM and apartments, and negatively correlated with owning a home and living in a house.

  • The cadres category is positively correlated with intermediate professions and negatively correlated with the manual workers and no diploma categories.

  • The education category (no diploma and diploma) is negatively correlated with owning a home and positively correlated with renting and with living in HLM and apartments.

  • The housing category shows that living in a house is positively correlated with owning a property and negatively correlated with living in HLM and apartments. Renting and living in HLM show the opposite behavior.
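The correlations read from the plot come from a correlation matrix, which can be computed with base R's `cor()`. The sketch below uses simulated percentages in a hypothetical data frame `toy`, not the census data. The section does not show which plotting function draws the ellipses, so only the matrix itself is computed here.

```r
# Simulated stand-in for three of the percentage variables (not census values).
set.seed(1)
n <- 50
proprio <- runif(n, 20, 80)
toy <- data.frame(
  proprio   = proprio,
  locataire = 100 - proprio + rnorm(n, sd = 2),   # tenants: roughly the complement of owners
  chom      = 25 - 0.2 * proprio + rnorm(n, sd = 2) # unemployment falls as ownership rises
)

# Pearson correlation matrix between all pairs of variables.
cor_mat <- cor(toy)
round(cor_mat, 2)
```

In this simulated example proprio is negatively correlated with both locataire and chom by construction, which mirrors the direction of the relationships listed above.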

We will now have a different correlation plot for each region.

Bourgogne Correlation plot

We can observe similar correlations for the Bourgogne region compared with the correlation of both the regions.

Franche-Comté Correlation plot

The correlation plot for the Franche-Comté region reveals the following differences:

  • The age category of 75 years and above shows a strong positive correlation with women and a negative correlation with men.

  • The gender category shows a similar trend but with weaker correlation than the last correlation plots.

  • The housing type behavior in the gender category does not exhibit a significant correlation.

  • The type of employment category exhibits a similar trend, with stronger correlation as indicated by darker colors, particularly with respect to the type of housing.

Analysis

We have seen that many variables in the data are correlated. We will use Principal Component Analysis (PCA), a dimension reduction method, to obtain a lower-dimensional representation of the dataset while preserving most of the variability of the original data. We will run a PCA for Bourgogne and Franche-Comté using variables from the following categories: population age group, gender, education level, employment status, housing type and housing status.

Bourgogne (BG) and Franche-Comté Region (FC)

We run a PCA on a set of 19 variables extracted from the census data rp2012. These variables capture different aspects, such as gender (male or female), education (has diploma or no diploma), employment (unemployed, agriculture, independent, managers, interm and ouvr workers), housing (owner, rent, HLM, house, apartment, resid_sec), and age groups (15-29, 30-74 and 75 and over). Below is the analysis of both regions together.

Variance of the variables

In order to determine the most suitable PCA analysis to conduct in this case, we examine the variance of our variables.

age_15_29 age_30_74_pct age_75p femmes pop_hommes_pct chom agric indep cadres interm ouvr dipl_aucun has_dip resid_sec proprio locataire hlm maison appart
age_15_29 10.10 -7.15 -3.47 -0.70 0.70 3.66 -0.52 -1.58 -0.12 -1.70 2.87 2.77 13004.37 -0.75 -23.82 23.48 10.62 -41.56 41.25
age_30_74_pct -7.15 10.46 -2.66 -1.87 1.87 -7.02 0.17 0.59 3.23 6.00 -6.98 -7.63 -8647.80 -0.83 29.60 -28.75 -13.93 41.20 -40.41
age_75p -3.47 -2.66 12.29 4.23 -4.23 4.34 0.44 1.62 -2.38 -4.89 0.96 5.29 -783.95 2.11 -13.28 12.41 4.72 -9.28 8.32
femmes -0.70 -1.87 4.23 3.50 -3.50 2.88 -0.09 0.33 0.44 -0.49 -1.72 2.26 2482.91 0.29 -10.13 9.81 6.53 -12.26 11.89
pop_hommes_pct 0.70 1.87 -4.23 -3.50 3.50 -2.88 0.09 -0.33 -0.44 0.49 1.72 -2.26 -2482.91 -0.29 10.13 -9.81 -6.53 12.26 -11.89
chom 3.66 -7.02 4.34 2.88 -2.88 21.37 -1.33 -2.38 -8.46 -11.39 18.34 21.44 6695.98 -1.10 -43.58 43.18 35.59 -55.07 54.25
agric -0.52 0.17 0.44 -0.09 0.09 -1.33 1.44 0.37 -1.01 -1.13 1.15 -0.62 -1291.93 1.13 2.47 -2.69 -2.41 6.13 -6.17
indep -1.58 0.59 1.62 0.33 -0.33 -2.38 0.37 2.97 -0.21 -0.36 -3.90 -2.26 -2181.17 2.19 4.33 -4.66 -6.70 10.27 -10.41
cadres -0.12 3.23 -2.38 0.44 -0.44 -8.46 -1.01 -0.21 22.66 14.57 -29.40 -16.63 8744.30 -3.58 17.64 -17.29 -13.80 4.64 -3.91
interm -1.70 6.00 -4.89 -0.49 0.49 -11.39 -1.13 -0.36 14.57 23.53 -29.94 -18.77 3119.05 -2.78 25.48 -24.82 -13.60 25.68 -25.12
ouvr 2.87 -6.98 0.96 -1.72 1.72 18.34 1.15 -3.90 -29.40 -29.94 75.30 34.68 -10515.02 4.40 -34.94 34.99 26.81 -43.28 42.88
dipl_aucun 2.77 -7.63 5.29 2.26 -2.26 21.44 -0.62 -2.26 -16.63 -18.77 34.68 37.21 -62.86 -0.51 -43.34 42.74 36.21 -51.99 50.97
has_dip 13004.37 -8647.80 -783.95 2482.91 -2482.91 6695.98 -1291.93 -2181.17 8744.30 3119.05 -10515.02 -62.86 59208832.20 -2014.91 -37456.81 37064.48 15990.84 -73541.41 73149.91
resid_sec -0.75 -0.83 2.11 0.29 -0.29 -1.10 1.13 2.19 -3.58 -2.78 4.40 -0.51 -2014.91 18.81 -3.08 1.87 -3.49 -1.52 1.05
proprio -23.82 29.60 -13.28 -10.13 10.13 -43.58 2.47 4.33 17.64 25.48 -34.94 -43.34 -37456.81 -3.08 175.31 -171.74 -108.09 234.69 -230.77
locataire 23.48 -28.75 12.41 9.81 -9.81 43.18 -2.69 -4.66 -17.29 -24.82 34.99 42.74 37064.48 1.87 -171.74 169.05 107.43 -230.77 226.99
hlm 10.62 -13.93 4.72 6.53 -6.53 35.59 -2.41 -6.70 -13.80 -13.60 26.81 36.21 15990.84 -3.49 -108.09 107.43 104.92 -141.72 140.02
maison -41.56 41.20 -9.28 -12.26 12.26 -55.07 6.13 10.27 4.64 25.68 -43.28 -51.99 -73541.41 -1.52 234.69 -230.77 -141.72 401.52 -397.92
appart 41.25 -40.41 8.32 11.89 -11.89 54.25 -6.17 -10.41 -3.91 -25.12 42.88 50.97 73149.91 1.05 -230.77 226.99 140.02 -397.92 395.38

The variances of the variables differ greatly in magnitude (for example, the variance of has_dip is several orders of magnitude larger than the others). To address this problem and prevent a single variable from dominating the analysis, we will use a standardized PCA, which scales each variable to unit variance.
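The effect of standardization can be illustrated with a small simulation. This sketch uses base R's `prcomp()` rather than the PCA function behind `PCA_sd`, and the three columns of `toy` are invented: two percentage-like variables and one count-like variable on a much larger scale.

```r
set.seed(42)
n <- 100
toy <- data.frame(
  pct_a   = rnorm(n, mean = 50, sd = 5),       # percentage-like variable
  pct_b   = rnorm(n, mean = 20, sd = 3),       # percentage-like variable
  count_c = rnorm(n, mean = 8000, sd = 2500)   # count-like variable, much larger variance
)

# PCA on the covariance matrix: the large-variance column dominates the first axis.
pca_cov <- prcomp(toy, center = TRUE, scale. = FALSE)
# Standardized PCA: every variable is rescaled to unit variance first.
pca_std <- prcomp(toy, center = TRUE, scale. = TRUE)

# Share of variance carried by each component in both cases.
share_cov <- pca_cov$sdev^2 / sum(pca_cov$sdev^2)
share_std <- pca_std$sdev^2 / sum(pca_std$sdev^2)
round(share_cov, 3)
round(share_std, 3)
```

Without scaling, almost all of the variance sits on the first component simply because count_c is measured on a larger scale. With scaling, the eigenvalues sum to the number of variables and no column dominates by units alone.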

PCA

kable(PCA_sd$eig[1:4,], digits = 3, format = "markdown")
eigenvalue percentage of variance cumulative percentage of variance
comp 1 7.367 38.771 38.771
comp 2 2.895 15.239 54.011
comp 3 2.657 13.986 67.997
comp 4 1.500 7.894 75.891
fviz_eig(PCA_sd)

The PCA reduces the dimension of the dataset. We will keep 3 axes, which together explain 68.00% of the variance of the data (75.89% with a fourth axis), and carry out our analysis with these 3 axes.
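The percentages in the eigenvalue table can be checked by hand. In a standardized PCA the eigenvalues sum to the number of variables, so each percentage is the eigenvalue divided by that number. The sketch below assumes 19 variables, which is the number of rows in the variable tables of this section, and uses the four eigenvalues reported above.

```r
# Eigenvalues of the first four components, copied from the table above.
eig <- c(7.367, 2.895, 2.657, 1.500)
p   <- 19  # number of variables (assumption: one per row of the variable tables)

pct     <- 100 * eig / p   # percentage of variance per component
cum_pct <- cumsum(pct)     # cumulative percentage of variance

round(pct, 2)
round(cum_pct, 2)
```

The third cumulative value is close to 68% and the fourth close to 75.9%, in agreement with the table up to rounding of the eigenvalues.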

Dimensions

kable(PCA_sd$var$coord[,1:3], digits = 3, format = "markdown")
Dim.1 Dim.2 Dim.3
age_15_29 0.555 -0.415 -0.501
age_30_74_pct -0.739 0.097 0.019
age_75p 0.303 0.240 0.772
femmes 0.447 -0.125 0.825
pop_hommes_pct -0.447 0.125 -0.825
chom 0.818 0.201 -0.011
agric -0.187 0.379 0.119
indep -0.291 0.180 0.471
cadres -0.348 -0.778 0.157
interm -0.507 -0.682 0.080
ouvr 0.448 0.695 -0.343
dipl_aucun 0.687 0.498 -0.085
has_dip 0.383 -0.576 -0.067
resid_sec 0.016 0.254 0.172
proprio -0.950 0.096 -0.025
locataire 0.951 -0.101 0.010
hlm 0.815 -0.021 -0.045
maison -0.887 0.308 0.095
appart 0.881 -0.313 -0.105

Component Summaries

PCA- Dim.1

The age group 30-74, unemployed individuals (chom), home owners (proprio), tenants (locataire), people living in a house (maison) or apartment (appart), and HLM residents (hlm) are the categories best represented by the first principal component. The category with no diploma (dipl_aucun) is also well represented.

PCA - Dim.2

The cadres are the best represented in this dimension, followed by interm and manual workers (ouvr).

PCA - Dim.3

The 75 and over age group, women and men are the best represented in this dimension.

Contribution

kable(PCA_sd$var$contrib[,1:3], digits = 3, format = "markdown")
Dim.1 Dim.2 Dim.3
age_15_29 4.181 5.943 9.450
age_30_74_pct 7.409 0.325 0.014
age_75p 1.246 1.984 22.412
femmes 2.710 0.542 25.616
pop_hommes_pct 2.710 0.542 25.616
chom 9.077 1.395 0.005
agric 0.473 4.965 0.529
indep 1.152 1.117 8.343
cadres 1.648 20.898 0.924
interm 3.490 16.051 0.241
ouvr 2.730 16.677 4.434
dipl_aucun 6.406 8.549 0.272
has_dip 1.992 11.444 0.169
resid_sec 0.004 2.226 1.119
proprio 12.239 0.319 0.023
locataire 12.287 0.356 0.004
hlm 9.016 0.015 0.075
maison 10.684 3.269 0.343
appart 10.547 3.383 0.412
fviz_contrib(PCA_sd, choice = "var", axes = 1)

According to the contribution plot, tenants (locataire), property owners (proprio), people living in a house or an apartment, people living in subsidized housing (HLM), and the unemployed contribute most to the first dimension.
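The coordinate, quality of representation (cos2) and contribution tables are linked by simple formulas: in a standardized PCA, cos2 is the squared coordinate, and the contribution is cos2 divided by the eigenvalue of the axis, times 100. The sketch below reproduces these relations with base R's `prcomp()` on the built-in `USArrests` data, used only as a stand-in since `PCA_sd` is not reconstructed here.

```r
# Standardized PCA on a built-in dataset (stand-in for the census variables).
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
eig <- pca$sdev^2                                  # eigenvalues

# Variable coordinates = loadings * sqrt(eigenvalue) = correlation with each axis.
coord   <- sweep(pca$rotation, 2, pca$sdev, `*`)
cos2    <- coord^2                                 # quality of representation
contrib <- 100 * sweep(cos2, 2, eig, `/`)          # contribution in percent

round(contrib, 2)
colSums(contrib)   # each axis sums to 100

# Same formula applied to a value reported above: proprio on Dim.1
# (coordinate -0.950, eigenvalue 7.367).
check_proprio <- 100 * (-0.950)^2 / 7.367
round(check_proprio, 2)
```

The last value is close to the contribution of 12.239 reported for proprio on Dim.1, with the small gap coming from the rounding of the coordinate.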

Dimension 1 and 2

Dim.1 Dim.2 Dim.3
age_15_29 0.308 0.172 0.251
age_30_74_pct 0.546 0.009 0.000
age_75p 0.092 0.057 0.596
femmes 0.200 0.016 0.681
pop_hommes_pct 0.200 0.016 0.681
chom 0.669 0.040 0.000
agric 0.035 0.144 0.014
indep 0.085 0.032 0.222
cadres 0.121 0.605 0.025
interm 0.257 0.465 0.006
ouvr 0.201 0.483 0.118
dipl_aucun 0.472 0.248 0.007
has_dip 0.147 0.331 0.004
resid_sec 0.000 0.064 0.030
proprio 0.902 0.009 0.001
locataire 0.905 0.010 0.000
hlm 0.664 0.000 0.002
maison 0.787 0.095 0.009
appart 0.777 0.098 0.011

The age groups of 30-74 and 15-29 are better represented in dimensions 1 and 2. Similarly, the categories of education attainment (diploma and no diploma), job types (cadres, intermediate, manual workers, and unemployed), and housing types (living in a house, owning a house, tenants, apartment, and HLM) are also well-represented in these dimensions.

PCA Correlation Circle

From our previous component summary for dimensions 1 and 2, the age group 30-74, unemployed individuals (chom), home owners (proprio), tenants (locataire), people living in a house (maison) or apartment (appart), and HLM residents (hlm) are the categories best represented by the first principal component. The category with no diploma (dipl_aucun) is also well represented.

  • We observe the following variables to interpret the first axis:

  • The level of Education:

    • No Diploma
  • Employment Status:

    • Manual Workers
    • Unemployed (chom)
  • Population age group:

    • 15-29 years old
  • Housing Status of residents:

    • Living in HLM
    • Living in an apartment
    • Renting (locataire)

In the opposite direction:

  • Housing Status of residents:
    • Living in a house
    • Owning a property.
  • Population age group:
    • 30-74 years old

The second axis:

  • The level of Education:
    • No Diploma
    • Has diploma (has_dipl)
  • Employment Status:
    • manual workers (ouvr)
    • Interm
    • Managers (Cadres)

Interpretation of the first axis:

  • Being a manual worker (ouvr) is closely correlated with not having a diploma.

  • Living in HLM housing is correlated with being unemployed, and inversely correlated with living in a house and being an owner.

  • The 30-74 age group is correlated with living in a house and being an owner.

  • Renting a property is correlated with living in an apartment, and inversely correlated with living in a house as an owner.

Interpretation of the second axis:

  • The 15-29 age group is correlated with having at least one diploma, while the older age groups are inversely correlated with working in intermediate professions (interm) or as executives.

  • Working as an executive (cadres) is inversely correlated with working as a manual worker (ouvr).

Communes 125 (Branges - BG) and 142 (Gergy - BG) are associated with home ownership and the 30-74 age group. Communes 138 (Le Creusot - BG) and 179 (Sens - BG) show the inverse relationship: renting an apartment and a younger population aged 15 to 29.

Dimension 1 and 3

We can see that the age groups 15-29, 30-74 and 75 and over are well represented. In education, those without a diploma are the most strongly represented. The housing variables maison, proprio, locataire, appart and hlm are well represented, as are the unemployed. Additionally, both women and men are better represented in this plane.

  • We get the additional information that women are associated with the population aged 75 or more, and inversely correlated with men, on the second axis.

We can see that communes 191 (Évette-Salbert - FC), 27 (Varois-et-Chaignot - BG), 98 (Marzy - BG) and 101 (Saint-Éloi - BG) are associated with home owners aged between 30 and 74. 85 (Salins-les-Bains - FC) and 88 (Château-Chinon (Ville) - BG). Communes 52 (Montbéliard - FC) and 63 (Seloncourt - FC) are associated with renting and unemployment.

Dimension 2 and 3

We see a more accurate representation of both women and men across a range of categories in this dimension, including those aged 15-29 and >75, as well as cadres, intermediate workers, those with diplomas, and manual workers.


We can see that 66 (Valdahon - FC), 118 (Saint-Sauveur - FC) and 161 (Varennes-le-Grand - BG) have a higher share of men. 9 (Fontaine-lès-Dijon - BG) and 20 (Saint-Apollinaire - BG) have a higher share of people employed as executives.

Clusters

According to the gap statistic method, two clusters are recommended for the combined dataset of the Bourgogne and Franche-Comté regions. With the K-means method, the cluster sizes are 93 and 102.

kmeans_clusters1 <- kmeans(PCA_sd$ind$coord[,1:3], centers = 2)
#kable(kmeans_clusters1$cluster, format = "markdown")
cat("The sizes of clusters 1 and 2 are", "\n", kmeans_clusters1$size)
## The sizes of clusters 1 and 2 are 
##  93 102
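K-means starts from random centers, so cluster labels and sizes can change from one run to the next. A minimal sketch of a reproducible run is given below. The matrix `scores` is simulated and stands in for the first three PCA coordinates, and the use of `set.seed()` and `nstart` is an assumption, not something the original call for the combined dataset does.

```r
# Simulated stand-in for the individual coordinates on the first three axes.
set.seed(123)
scores <- rbind(
  matrix(rnorm(60 * 3, mean = -2), ncol = 3),
  matrix(rnorm(40 * 3, mean =  2), ncol = 3)
)
colnames(scores) <- c("Dim.1", "Dim.2", "Dim.3")

# Fixing the seed and using several random starts makes the result reproducible
# and less dependent on a single unlucky initialisation.
km <- kmeans(scores, centers = 2, nstart = 25)

km$size                         # number of points in each cluster
round(km$betweenss / km$totss, 3)  # share of total variance explained by the partition
```

The cluster numbering itself is arbitrary: swapping labels 1 and 2 describes the same partition, which is why the sizes can appear in either order.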

Cluster plot


3D Plot cluster by the K-means

Dendrogram

We use the hclust function with the ward.D2 method to generate the dendrogram, selecting 3 as the number of clusters.

## 
##  The communes of the First cluster are:  
##  Arc-sur-Tille Beaune Châtillon-sur-Seine Chenôve Chevigny-Saint-Sauveur Dijon Fontaine-lès-Dijon Gevrey-Chambertin Is-sur-Tille Longvic Marsannay-la-Côte Mirebeau-sur-Bèze Montbard Nuits-Saint-Georges Plombières-lès-Dijon Quetigny Saint-Apollinaire Saulieu Semur-en-Auxois Sennecey-lès-Dijon Seurre Talant Varois-et-Chaignot Venarey-les-Laumes Avanne-Aveney Baume-les-Dames Bavans Besançon Doubs École-Valentin Étupes Exincourt Franois L'Isle-sur-le-Doubs Mathay Miserey-Salines Montbéliard Montferrand-le-Château Morteau Ornans Pirey Pontarlier Pont-de-Roide Roche-lez-Beaupré Saint-Vit Saône Seloncourt Thise Vieux-Charmont Voujeaucourt Arbois Champagnole Damparis Dole Foucherans Lons-le-Saunier Montmorot Morbier Poligny Saint-Amour Saint-Claude Salins-les-Bains Tavaux La Charité-sur-Loire Château-Chinon (Ville) Clamecy Cosne-Cours-sur-Loire Coulanges-lès-Nevers Decize Fourchambault Guérigny Marzy Nevers Pougues-les-Eaux Saint-Éloi Saint-Pierre-le-Moûtier Varennes-Vauzelles Arc-lès-Gray Échenoz-la-Méline Gray Lure Luxeuil-les-Bains Rioz Vaivre-et-Montoille Vesoul Autun Bourbon-Lancy Le Breuil Buxy Chagny Chalon-sur-Saône Charnay-lès-Mâcon Charolles Châtenoy-le-Royal Chauffailles Cluny Le Creusot Crissey Digoin Épinac Givry Gueugnon Louhans Mâcon Montceau-les-Mines Montcenis Montchanin Paray-le-Monial Saint-Marcel Saint-Rémy Saint-Vallier Sanvignes-les-Mines Sennecey-le-Grand Tournus Appoigny Auxerre Avallon Chevannes Joigny Monéteau Paron Pont-sur-Yonne Saint-Clément Saint-Georges-sur-Baulche Saint-Julien-du-Sault Sens Tonnerre Toucy Villeneuve-sur-Yonne Bavilliers Belfort Châtenois-les-Forges Danjoutin Essert Évette-Salbert Giromagny Offemont Valdoie
## 
##  The communes of the Second cluster are:  
##  Auxonne Brazey-en-Plaine Genlis Selongey Audincourt Bethoncourt Charquemont Fesches-le-Châtel Les Fins Grand-Charmont Hérimoncourt Villers-le-Lac Levier Maîche Mandeure Le Russey Sochaux Valdahon Valentigney Moirans-en-Montagne Morez Les Rousses Saint-Lupicin Garchizy Imphy La Machine Saint-Léger-des-Vignes Champagney Fougerolles Héricourt Noidans-lès-Vesoul Port-sur-Saône Ronchamp Saint-Loup-sur-Semouse Saint-Sauveur Blanzy Branges Champforgeuil La Chapelle-de-Guinchay Ciry-le-Noble Crêches-sur-Saône Gergy Ouroux-sur-Saône Saint-Germain-du-Plain Sornay Torcy Varennes-le-Grand Brienon-sur-Armançon Chablis Champigny Cheny Migennes Saint-Florentin Villeneuve-la-Guyard Beaucourt Delle Grandvillars
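The dendrogram step described above can be sketched with base R's `dist()`, `hclust()` and `cutree()`. The matrix `scores`, its row names and the choice of k = 2 are illustrative assumptions, and the code that produced the listing above is not shown in this section.

```r
# Simulated stand-in for the PCA coordinates, with hypothetical commune labels.
set.seed(123)
scores <- rbind(
  matrix(rnorm(30 * 3, mean = -2), ncol = 3),
  matrix(rnorm(20 * 3, mean =  2), ncol = 3)
)
rownames(scores) <- paste0("commune_", seq_len(nrow(scores)))

# Ward's criterion on Euclidean distances, as in the text.
hc <- hclust(dist(scores), method = "ward.D2")

# Cut the tree into k groups; k = 2 is used here for illustration.
groups <- cutree(hc, k = 2)

# List the members of each cluster, similar to the output above.
split(names(groups), groups)

# plot(hc) would draw the dendrogram itself.
```

`cutree()` returns a named vector, so the cluster membership can be joined back to the commune table by name.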

Map of the clusters

Cluster Analysis:

  • We conducted a cluster analysis using both 2 and 3 clusters. The gap statistic, which compares the within-cluster dispersion with the dispersion expected under a reference distribution without cluster structure, suggested 2 clusters. When we used 3 clusters, we found significant overlap between the clusters, so we continued our analysis with 2 clusters.

  • Once we have applied two clustering techniques, k-means and dendrogram, we can compare the results of both methods to gain a better understanding of the data structure and identify any potential patterns or relationships between the variables.

  • Although the k-means clustering method resulted in many intersections, making it difficult to interpret, the dendrogram method provided a better separation of the communes, leading to a clearer splitting of the communes for further interpretation. Therefore, we may find that the dendrogram method is more useful for analyzing this particular dataset.

  • After obtaining the two clusters using the dendrogram method, we can add them to the map of the Bourgogne and Franche-Comté regions to visually compare the two clusters in terms of demographic variables. This could help us to identify any potential relationships between the clusters and demographic factors.
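One simple way to compare the two clustering methods is to cross-tabulate their labels. The sketch below does this on simulated coordinates; `scores` is a hypothetical stand-in for the PCA coordinates, and the agreement measure is only one possible choice.

```r
# Simulated stand-in for the PCA coordinates.
set.seed(123)
scores <- rbind(
  matrix(rnorm(30 * 3, mean = -2), ncol = 3),
  matrix(rnorm(20 * 3, mean =  2), ncol = 3)
)

# Two partitions of the same points: k-means and Ward hierarchical clustering.
km        <- kmeans(scores, centers = 2, nstart = 25)
hc_groups <- cutree(hclust(dist(scores), method = "ward.D2"), k = 2)

# Cross-tabulation: rows are k-means clusters, columns are dendrogram clusters.
agreement <- table(kmeans = km$cluster, hclust = hc_groups)
agreement

# Cluster labels are arbitrary, so agreement is taken over both label matchings.
n <- sum(agreement)
max(sum(diag(agreement)), n - sum(diag(agreement))) / n
```

A table concentrated on one diagonal means that both methods produce essentially the same partition, whereas spread-out counts indicate communes that the two methods assign differently.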

We will now perform a PCA for each region to see how the findings for both regions together differ when the regions are analysed separately.

PCA of Bourgogne

PCA

kable(PCA_sd_bg$eig[1:3,], digits = 3, format = "markdown")
eigenvalue percentage of variance cumulative percentage of variance
comp 1 7.820 41.159 41.159
comp 2 3.592 18.905 60.063
comp 3 2.267 11.931 71.994
fviz_eig(PCA_sd_bg)

We will also keep 3 axes for this region, which explain 72% of the variance.

The contribution plot shows that tenants (locataire), owners (proprio), people living in a house, an apartment or HLM housing, and the unemployed contribute most to the first axis.

Dimensions

kable(PCA_sd_bg$var$coord[,1:3], digits = 3, format = "markdown")
Dim.1 Dim.2 Dim.3
age_15_29 0.570 -0.513 -0.374
age_30_74_pct -0.799 0.036 -0.032
age_75p 0.432 0.332 0.653
femmes 0.648 -0.031 0.647
pop_hommes_pct -0.648 0.031 -0.647
chom 0.824 0.233 -0.125
agric -0.129 0.420 0.018
indep -0.216 0.370 0.504
cadres -0.302 -0.751 0.386
interm -0.460 -0.720 0.119
ouvr 0.386 0.749 -0.384
dipl_aucun 0.670 0.493 -0.285
has_dip 0.364 -0.543 -0.042
resid_sec 0.119 0.592 0.370
proprio -0.955 0.066 0.004
locataire 0.956 -0.073 -0.023
hlm 0.830 -0.101 -0.248
maison -0.877 0.381 -0.001
appart 0.870 -0.389 -0.007

Component Summaries

First Principal Component - Dim.1

The best represented variables are age_30_74_pct, chom, proprio, locataire, hlm, maison and appart.

Second Principal Component - Dim.2

The best represented variables are cadres, interm and manual workers (ouvr).

Third Principal Component - Dim.3

Women, men and people aged 75 and over are well represented.

Contribution

kable(PCA_sd_bg$var$contrib[,1:3], digits = 3, format = "markdown")
Dim.1 Dim.2 Dim.3
age_15_29 4.152 7.322 6.184
age_30_74_pct 8.158 0.037 0.046
age_75p 2.384 3.066 18.832
femmes 5.375 0.026 18.442
pop_hommes_pct 5.375 0.026 18.442
chom 8.688 1.509 0.689
agric 0.213 4.906 0.015
indep 0.595 3.817 11.197
cadres 1.168 15.690 6.586
interm 2.705 14.435 0.624
ouvr 1.904 15.637 6.512
dipl_aucun 5.734 6.762 3.582
has_dip 1.690 8.196 0.078
resid_sec 0.180 9.745 6.033
proprio 11.665 0.121 0.001
locataire 11.690 0.148 0.023
hlm 8.802 0.286 2.711
maison 9.843 4.050 0.000
appart 9.679 4.221 0.002
fviz_contrib(PCA_sd_bg, choice = "var", axes = 1)

Quality of representation

Dim.1 Dim.2 Dim.3
age_15_29 0.325 0.263 0.140
age_30_74_pct 0.638 0.001 0.001
age_75p 0.186 0.110 0.427
femmes 0.420 0.001 0.418
pop_hommes_pct 0.420 0.001 0.418
chom 0.679 0.054 0.016
agric 0.017 0.176 0.000
indep 0.047 0.137 0.254
cadres 0.091 0.564 0.149
interm 0.212 0.519 0.014
ouvr 0.149 0.562 0.148
dipl_aucun 0.448 0.243 0.081
has_dip 0.132 0.294 0.002
resid_sec 0.014 0.350 0.137
proprio 0.912 0.004 0.000
locataire 0.914 0.005 0.001
hlm 0.688 0.010 0.061
maison 0.770 0.145 0.000
appart 0.757 0.152 0.000

Dimension 1 and 2

  • We observe the following variables to interpret the first axis:

  • Employment Status:

    • Manual Workers (ouvr)
    • Unemployed (chom)
  • Population age group:

    • 15-29 years old
  • Housing Status of residents:

    • Living in HLM
    • Living in an apartment
    • Renting (locataire)
  • The level of Education:

    • No Diploma (dipl_aucun)

In the opposite direction:

  • Housing Status of residents:
    • Living in a house
    • Owning a property.
  • Population age group:
    • 30-74 years old

The second axis:

  • Employment Status:
    • Interm
    • Managers (Cadres)
  • The level of Education:
    • No Diploma

Interpretation of the first axis:

  • Owning a house is correlated with the 30-74 age group. On the other hand, it is inversely correlated with renting an apartment and with the younger 15-29 age group.

  • Living in HLM housing is correlated with unemployment and with having no diploma.

  • Having no diploma is correlated with being a manual worker (ouvr).

  • We made the same observations when analysing both regions together.

Interpretation of the second axis:

  • The younger 15-29 age group is correlated with having a diploma and renting an apartment.

  • Executives (cadres) are correlated with intermediate professions (interm) and inversely correlated with having no diploma.

We see some communes associated with home ownership on the first axis, and others associated with HLM housing and unemployment. We also see many communes associated with manual workers (ouvr) with no diploma. Only a small number of communes, such as 7 (Chevigny-Saint-Sauveur), 9 (Fontaine-lès-Dijon) and 20 (Saint-Apollinaire), are associated with executives and intermediate professions.

Dimension 1 and 3

Interpretation of the first axis:

  • On this plane, executives (cadres) are better represented and are associated with the male population. Communes such as 4 (Brazey-en-Plaine), 36 (Garchizy) and 48 (Blanzy) show the closest correlations. Fewer communes are characterized by manual jobs and no diploma. Communes such as 90 (Avallon) and 105 (Sens) are more closely correlated with renting an apartment than with living in HLM housing.

Dimension 2 and 3

On this plane, executives (cadres) and manual workers (ouvr) are represented more accurately and appear inversely correlated.

Interpretation of the second axis:

  • Many more communes in Bourgogne have a large share of manual workers than a large share of executives. Older people aged 75+ also appear to be closely associated with a larger number of communes.

Clusters

#clusters
set.seed(123)
kmeans_clusters_bg <- kmeans(PCA_bourg$ind$coord[,1:3], centers = 3)

# Plot clusters with labels
p_bg <- fviz_cluster(kmeans_clusters_bg , data = PCA_bourg$ind$coord[,1:3], 
                  geom = "point", ellipse.type = "norm") +
  geom_text_repel(aes(label = rownames(PCA_bourg$ind$coord[,1:3])), 
                  fontface = "bold", size = 3)

p_bg
## Warning: ggrepel: 9 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

Dendrogram

# Filter bourg_dat to select only the communes in Bourgogne
bourg_dat_bg <- bourg_dat[bourg_dat$region == "Bourgogne", ]

# Print the selected communes
#bourg_dat_bg

Multifac_dat_bg <- PCA_bourg$ind$coord[,1:3]

# Perform hierarchical clustering
hc_bg <- hclust(dist(scale(Multifac_dat_bg)), method = "ward.D2")

# Cut the dendrogram into 3 clusters
cut_ward_bg <- cutree(hc_bg, k = 3)

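# Set the commune names as labels for the dendrogram leaves.
# hclust stores labels in the original row order of the data, so assign them directly.
hc_bg$labels <- as.character(bourg_dat_bg$commune)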

# Plot the dendrogram with cluster borders and commune names
plot(hc_bg, hang = -1, main = "Hierarchical Clustering of Communes (Bourgogne)")
rect.hclust(hc_bg, k = 3, border = 2:4)
abline(h = 3, col = "red")

## 
##  The comunnes of the first cluster 
##  Arc-sur-Tille Fontaine-lès-Dijon Marsannay-la-Côte Mirebeau-sur-Bèze Saint-Apollinaire Sennecey-lès-Dijon Talant Varois-et-Chaignot Coulanges-lès-Nevers Marzy Pougues-les-Eaux Saint-Éloi Varennes-Vauzelles Le Breuil Buxy Charnay-lès-Mâcon Châtenoy-le-Royal Crêches-sur-Saône Crissey Givry Montcenis Saint-Rémy Saint-Vallier Appoigny Chevannes Monéteau Saint-Clément Saint-Georges-sur-Baulche
## 
##  The comunnes of the second cluster 
##  Auxonne Beaune Chenôve Chevigny-Saint-Sauveur Dijon Genlis Gevrey-Chambertin Is-sur-Tille Longvic Montbard Nuits-Saint-Georges Plombières-lès-Dijon Quetigny Semur-en-Auxois Venarey-les-Laumes Fourchambault Imphy Nevers Autun Chagny Chalon-sur-Saône Champforgeuil Cluny Le Creusot Mâcon Montceau-les-Mines Montchanin Saint-Marcel Torcy Varennes-le-Grand Auxerre Avallon Brienon-sur-Armançon Cheny Joigny Migennes Paron Saint-Florentin Sens Tonnerre
## 
##  The comunnes of the third cluster 
##  Brazey-en-Plaine Châtillon-sur-Seine Saulieu Selongey Seurre La Charité-sur-Loire Château-Chinon (Ville) Clamecy Cosne-Cours-sur-Loire Decize Garchizy Guérigny La Machine Saint-Léger-des-Vignes Saint-Pierre-le-Moûtier Blanzy Bourbon-Lancy Branges La Chapelle-de-Guinchay Charolles Chauffailles Ciry-le-Noble Digoin Épinac Gergy Gueugnon Louhans Ouroux-sur-Saône Paray-le-Monial Saint-Germain-du-Plain Sanvignes-les-Mines Sennecey-le-Grand Sornay Tournus Chablis Champigny Pont-sur-Yonne Saint-Julien-du-Sault Toucy Villeneuve-la-Guyard Villeneuve-sur-Yonne
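The code that printed the lists above is not shown in the report. A minimal sketch of one way to produce such lists is given below; it assumes a vector of commune names and a vector of cluster memberships in the same order (as returned by cutree()). The six communes and the object names are a small illustrative subset, with memberships taken from the output above.

```r
# Illustrative subset: commune names and their cluster (same order in both vectors)
communes <- c("Dijon", "Talant", "Beaune", "Saulieu", "Nevers", "Decize")
clusters <- c(2, 1, 2, 3, 2, 3)

# Group the commune names by cluster membership
communes_by_cluster <- split(communes, clusters)

# Print one line of commune names per cluster
for (k in names(communes_by_cluster)) {
  cat("\nCommunes of cluster", k, ":\n",
      paste(communes_by_cluster[[k]], collapse = ", "), "\n")
}
```

With the real data, `communes` would be the commune column of the filtered data frame and `clusters` the result of cutree(), provided both follow the same row order.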

PCA of Franche-Comte Region

The communes of the Franche-Comté region of France were organized in a hierarchical system until 2012. At the top of this system were the départements, or administrative divisions, of Doubs, Jura, Haute-Saône, and Territoire de Belfort. These départements were then divided into arrondissements, which were further divided into cantons. At the lowest level were the communes, which were the smallest administrative divisions in the region.

The communes of Franche-Comté varied in size and population, ranging from small rural villages to larger urban centers. In 2012, there were a total of 1,178 communes in the region. In this data set, however, we study 86 communes.

PCA

kable(PCA_sd$eig[1:3,], digits = 3, format = "markdown")
eigenvalue percentage of variance cumulative percentage of variance
comp 1 7.367 38.771 38.771
comp 2 2.895 15.239 54.011
comp 3 2.657 13.986 67.997
fviz_eig(PCA_sd)

Quality of representation

Dim.1 Dim.2 Dim.3
age_15_29 0.281 0.078 0.375
age_30_74_pct 0.433 0.001 0.042
age_75p 0.041 0.347 0.290
femmes 0.065 0.778 0.074
pop_hommes_pct 0.065 0.778 0.074
chom 0.667 0.000 0.034
agric 0.069 0.038 0.022
indep 0.126 0.082 0.000
cadres 0.196 0.228 0.255
interm 0.349 0.167 0.186
ouvr 0.270 0.306 0.231
dipl_aucun 0.501 0.019 0.226
has_dip 0.174 0.058 0.373
resid_sec 0.000 0.016 0.018
proprio 0.881 0.008 0.028
locataire 0.885 0.008 0.028
hlm 0.652 0.004 0.002
maison 0.829 0.004 0.056
appart 0.824 0.004 0.055

The PCA reduces the dimensionality of the data set. We keep the first 3 axes, which together explain about 68% of the variance of the data, and carry out our analysis on these 3 axes.
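The percentages in the eigenvalue table can be checked by hand. The sketch below assumes a standardized PCA on the 19 variables, so that the total inertia equals 19; this is an assumption, although the reported figures are consistent with it. The object names are illustrative.

```r
# Eigenvalues of the first three components, copied from the table above
eigenvalues <- c(comp1 = 7.367, comp2 = 2.895, comp3 = 2.657)

# Assumption: standardized PCA, so the total inertia is the number of variables
n_variables <- 19

# Percentage of variance per component and cumulative percentage
pct_variance <- 100 * eigenvalues / n_variables
cum_pct <- cumsum(pct_variance)

print(round(cbind(eigenvalues, pct_variance, cum_pct), 2))
```

The cumulative value for the third component is just under 68%, which matches the table up to rounding of the eigenvalues.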

Dimensions

kable(PCA_fc.sd$var$coord[,1:3], digits = 3, format = "markdown")
Dim.1 Dim.2 Dim.3
age_15_29 0.530 -0.280 0.613
age_30_74_pct -0.658 -0.030 -0.206
age_75p 0.203 0.589 -0.539
femmes 0.254 0.882 -0.272
pop_hommes_pct -0.254 -0.882 0.272
chom 0.817 0.012 -0.185
agric -0.262 -0.195 -0.148
indep -0.355 0.286 -0.005
cadres -0.442 0.477 0.505
interm -0.591 0.409 0.432
ouvr 0.520 -0.553 -0.480
dipl_aucun 0.708 -0.138 -0.475
has_dip 0.418 0.242 0.611
resid_sec -0.013 -0.127 0.134
proprio -0.939 -0.092 -0.169
locataire 0.941 0.092 0.168
hlm 0.808 0.062 -0.043
maison -0.910 -0.063 -0.236
appart 0.908 0.062 0.234
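For the variables of a standardized PCA, the quality of representation (cos2) on an axis is the square of the coordinate on that axis. The short check below uses a few Dim.1 values copied from the two tables above; the object names are illustrative, and small differences come from the three-decimal rounding.

```r
# Dim.1 coordinates, copied from the coordinates table
coord_dim1 <- c(chom = 0.817, proprio = -0.939, locataire = 0.941,
                hlm = 0.808, maison = -0.910)

# Dim.1 cos2 values, copied from the quality of representation table
cos2_table <- c(chom = 0.667, proprio = 0.881, locataire = 0.885,
                hlm = 0.652, maison = 0.829)

# cos2 recomputed as the squared coordinate
cos2_from_coord <- coord_dim1^2

print(round(cbind(coord_dim1, cos2_from_coord, cos2_table), 3))
```

The sign of the coordinate gives the direction of the association (for example, proprio and locataire point in opposite directions on Dim.1), while cos2 only measures how well the variable is represented.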

Component Summaries

First Principal Component - Dim.1

The best-represented variables are living in a house, an apartment, or HLM housing, as well as being unemployed or renting. Among the age groups, 15-29 and 30-74 are the best represented. In terms of employment, managers (cadres), intermediate occupations (interm), and manual workers (ouvr) stand out. People with no diploma are also well represented.

Second Principal Component - Dim.2

On the second axis, both genders (male and female) are well represented. The age group over 75 years old is also well represented, followed by manual workers (ouvr), managers (cadres), and intermediate occupations (interm).

Third Principal Component - Dim.3

People aged 15-29 and those aged 75 and over are well represented. Executives (cadres), intermediate occupations (interm) and manual workers (ouvr), as well as people with no diploma and people with at least one diploma, are also well represented.

Contribution

kable(PCA_fc.sd$var$contrib[,1:3], digits = 3, format = "markdown")
Dim.1 Dim.2 Dim.3
age_15_29 3.841 2.674 15.846
age_30_74_pct 5.924 0.030 1.791
age_75p 0.563 11.876 12.253
femmes 0.886 26.590 3.118
pop_hommes_pct 0.886 26.590 3.118
chom 9.124 0.005 1.441
agric 0.942 1.302 0.929
indep 1.726 2.788 0.001
cadres 2.678 7.782 10.747
interm 4.773 5.710 7.873
ouvr 3.701 10.472 9.735
dipl_aucun 6.856 0.655 9.528
has_dip 2.386 1.999 15.737
resid_sec 0.002 0.555 0.758
proprio 12.062 0.289 1.202
locataire 12.106 0.288 1.198
hlm 8.926 0.132 0.078
maison 11.340 0.135 2.343
appart 11.280 0.131 2.305
fviz_contrib(PCA_fc.sd, choice = "var", axes = 1)

According to the contribution plot, the variables that contribute the most to the first dimension are being a tenant (locataire), being a property owner (proprio), living in a house or an apartment, living in subsidized housing (HLM), being unemployed, having no diploma, and belonging to the 30-74 age group.
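The contributions can be reproduced from the quality of representation. The sketch below assumes that the contribution of a variable to an axis is its cos2 divided by the sum of the cos2 values on that axis, expressed as a percentage. The cos2 values are copied from the Dim.1 column of the quality of representation table; the object names are illustrative. Variables are then compared with the uniform level 100/19, a reference commonly used to read such a plot.

```r
# cos2 on Dim.1 for Franche-Comté, copied from the quality of representation table
cos2_dim1 <- c(
  age_15_29 = 0.281, age_30_74_pct = 0.433, age_75p = 0.041,
  femmes = 0.065, pop_hommes_pct = 0.065, chom = 0.667,
  agric = 0.069, indep = 0.126, cadres = 0.196, interm = 0.349,
  ouvr = 0.270, dipl_aucun = 0.501, has_dip = 0.174, resid_sec = 0.000,
  proprio = 0.881, locataire = 0.885, hlm = 0.652,
  maison = 0.829, appart = 0.824
)

# Contribution (%) of each variable to Dim.1
contrib <- 100 * cos2_dim1 / sum(cos2_dim1)

# Variables contributing more than the uniform level 100 / (number of variables)
above_average <- contrib[contrib > 100 / length(contrib)]
print(round(sort(above_average, decreasing = TRUE), 2))
```

The eight variables above the uniform level are the ones listed in the paragraph above.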

Dimension 1 and 2

Interpretation of the first axis

  • The interpretation is almost the same as for the Bourgogne region, but with a stronger correlation between renting an apartment, having no diploma and being unemployed.

  • Executive jobs are associated less with the male population and more with the female population, in a larger number of communes.

  • Renting an apartment and living in HLM housing are more closely correlated.

Interpretation of the second axis

  • People aged 75+ and women are more strongly correlated across communes.

Dimension 1 and 3

People renting an apartment are the most represented in communes 24 (Montbéliard), 47 (Lons-le-Saunier), 51 (Morez) and 74 (Vesoul). People owning a home are well represented in 2 (Avanne-Aveney), 22 (Mathay), 25 (Montferrand-le-Château), 60 (Champagney) and 78 (Châtenois-les-Forges).

Dimension 2 and 3

There are more women aged 75+ in communes 2 (Avanne-Aveney), 3 (Baume-les-Dames) and 25 (Montferrand-le-Château), as well as in 57 (Salins-les-Bains) and 63 (Gray).

Clusters

#clusters

set.seed(123)
kmeans_clusters_FC <- kmeans(PCA_fc.sd$ind$coord[,1:3], centers = 2)

# Plot clusters with labels
p_FC <- fviz_cluster(kmeans_clusters_FC, data = PCA_fc.sd$ind$coord[,1:3], 
                     geom = "point", ellipse.type = "norm") +
  geom_text_repel(aes(label = rownames(PCA_fc.sd$ind$coord[,1:3])), 
                  fontface = "bold", size = 3)

p_FC
## Warning: ggrepel: 11 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

The interpretation of the k-means method with 2 or 3 clusters is difficult because the ellipses all intersect each other, making it challenging to distinguish the clusters.
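One common way to compare candidate numbers of clusters is to look at the total within-cluster sum of squares for several values of k. The sketch below is only an illustration on simulated data standing in for the first three principal-component coordinates; the data, the range of k and the object names are assumptions, and this check was not part of the original analysis.

```r
set.seed(123)

# Simulated stand-in for the coordinates on the first three components:
# three groups of 30 points with different means
toy_coord <- rbind(
  matrix(rnorm(90, mean = -3), ncol = 3),
  matrix(rnorm(90, mean = 0), ncol = 3),
  matrix(rnorm(90, mean = 3), ncol = 3)
)

# Total within-cluster sum of squares for k = 1 to 6
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  kmeans(toy_coord, centers = k, nstart = 25)$tot.withinss
})

print(data.frame(k = k_values, wss = round(wss, 1)))
```

A sharp drop followed by a plateau suggests a reasonable k. When the drop is gradual, as overlapping ellipses would suggest, k-means is unlikely to give well-separated groups.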

Dendrogram

K-means does not separate the clusters well, whereas the dendrogram gives a clearer representation.

# Filter FC_dat to select only the communes in Franche-Comté
FC_dat_fc <- FC_dat[FC_dat$region == "Franche-Comte", ]

# Extract the first three principal components from PCA_fc.sd
Multifac_dat_fc <- PCA_fc.sd$ind$coord[,1:3]

# Perform hierarchical clustering with Ward's method
hc_fc <- hclust(dist(scale(Multifac_dat_fc)), method = "ward.D2")

# Cut the dendrogram into 3 clusters
cut_ward_fc <- cutree(hc_fc, k = 3)

# Set the commune names as labels for the dendrogram leaves.
# hclust stores labels in the original row order of the data, so assign them directly.
hc_fc$labels <- as.character(FC_dat_fc$commune)

# Plot the dendrogram with cluster borders and commune names
plot(hc_fc, hang = -1, main = "Hierarchical Clustering of Communes (Franche-Comté)")
rect.hclust(hc_fc, k = 3, border = 2:4)
abline(h = 3, col = "red")

## 
##  The comunnes of the first cluster 
##  Audincourt Bethoncourt Charquemont Fesches-le-Châtel Les Fins Grand-Charmont Hérimoncourt L'Isle-sur-le-Doubs Villers-le-Lac Levier Maîche Mandeure Morteau Pontarlier Pont-de-Roide Le Russey Sochaux Valdahon Valentigney Moirans-en-Montagne Morez Les Rousses Saint-Claude Saint-Lupicin Fougerolles Héricourt Noidans-lès-Vesoul Port-sur-Saône Ronchamp Saint-Loup-sur-Semouse Saint-Sauveur Beaucourt Delle Giromagny Grandvillars Offemont
## 
##  The comunnes of the second cluster 
##  Avanne-Aveney Baume-les-Dames Bavans Doubs École-Valentin Étupes Exincourt Franois Mathay Miserey-Salines Montferrand-le-Château Ornans Pirey Roche-lez-Beaupré Saint-Vit Saône Seloncourt Thise Vieux-Charmont Voujeaucourt Arbois Damparis Foucherans Montmorot Morbier Saint-Amour Tavaux Arc-lès-Gray Champagney Échenoz-la-Méline Rioz Vaivre-et-Montoille Châtenois-les-Forges Essert Évette-Salbert
## 
##  The comunnes of the third cluster 
##  Besançon Montbéliard Champagnole Dole Lons-le-Saunier Poligny Salins-les-Bains Gray Lure Luxeuil-les-Bains Vesoul Bavilliers Belfort Danjoutin Valdoie

Study some differences between the Clusters

# Calculate means of variables for each cluster
means_by_cluster <- aggregate(FC_dat_fc[, c("femmes", "pop_hommes_pct", "chom", "agric", "indep", "cadres")], 
                              by = list(cluster = cut_ward_fc), 
                              FUN = mean)

means_by_cluster
##   cluster   femmes pop_hommes_pct      chom     agric    indep    cadres
## 1       1 50.32449       49.67551 14.991999 0.5656880 4.276999  7.331141
## 2       2 51.50741       48.49259  9.756432 0.4434357 5.707986 12.902751
## 3       3 53.24588       46.75412 17.526271 0.3746760 5.037789 11.502063
# Calculate medians of variables for each cluster
medians_by_cluster <- aggregate(FC_dat_fc[, c("femmes", "pop_hommes_pct", "chom", "agric", "indep", "cadres")], 
                                by = list(cluster = cut_ward_fc), 
                                FUN = median)

medians_by_cluster 
##   cluster   femmes pop_hommes_pct      chom      agric    indep    cadres
## 1       1 50.80316       49.19684 15.497079 0.06922811 4.288935  6.961646
## 2       2 51.36276       48.63724  9.272965 0.36697248 5.746626 11.799410
## 3       3 53.26149       46.73851 17.072148 0.16778523 4.447614 10.918801

Variables compared across the clusters: age_15_29, age_30_74_pct, age_75p, femmes, pop_hommes_pct, chom, agric, indep, cadres, interm, ouvr, dipl_aucun, has_dip, resid_sec, proprio, locataire, hlm, maison, appart

# Create box plots of variables by cluster.
# The commune-level data are used so that each box shows the spread within a cluster.
par(mfrow = c(1,3))
boxplot(FC_dat_fc$femmes ~ cut_ward_fc, xlab = "Cluster", ylab = "femmes",
        main = "Women (%) by Cluster")
boxplot(FC_dat_fc$pop_hommes_pct ~ cut_ward_fc, xlab = "Cluster", ylab = "pop_hommes_pct",
        main = "Men (%) by Cluster")
boxplot(FC_dat_fc$agric ~ cut_ward_fc, xlab = "Cluster", ylab = "agric",
        main = "Agricultural Workers (%) by Cluster")

Study of the Clusters

By comparing age, gender, occupation, education, and type of housing, we can analyze the cluster means to obtain a more comprehensive understanding of the three clusters obtained for the Franche-Comté region.

Means by cluster:

cluster femmes pop_hommes_pct chom agric indep cadres age_15_29 age_30_74_pct age_75p interm ouvr dipl_aucun has_dip resid_sec proprio locataire hlm maison appart
1 50.32449 49.67551 14.991999 0.5656880 4.276999 7.331141 18.23375 54.03734 9.438438 19.78441 40.78336 22.94525 2366.778 3.277629 56.29185 41.75355 18.443625 53.48484 45.87418
2 51.50741 48.49259 9.756432 0.4434357 5.707986 12.902752 15.52355 56.25988 10.382173 26.36782 26.07571 15.29675 1627.714 1.985429 69.21894 29.16096 9.151561 73.38042 26.16768
3 53.24588 46.75412 17.526271 0.3746760 5.037789 11.502064 19.84538 51.47729 12.329923 22.47357 28.59805 21.96802 9875.467 2.332520 42.94616 54.70610 24.068914 34.79207 64.03049

Medians by cluster:

cluster femmes pop_hommes_pct chom agric indep cadres age_15_29 age_30_74_pct age_75p interm ouvr dipl_aucun has_dip resid_sec proprio locataire hlm maison appart
1 50.80316 49.19684 15.497079 0.0692281 4.288935 6.961646 17.34233 54.67033 9.148917 19.71087 41.56923 21.92019 1820.5 1.6352560 56.37730 42.20348 16.752136 56.13110 43.51384
2 51.36276 48.63724 9.272965 0.3669725 5.746626 11.799410 15.33164 55.64313 9.642744 26.18619 26.66667 14.77573 1463.0 0.8849558 68.75000 30.14706 8.998647 72.65569 26.73943
3 53.26149 46.73851 17.072148 0.1677852 4.447614 10.918801 19.14943 51.57610 11.644501 22.10014 27.97943 21.27419 3706.0 1.5268079 44.75975 52.36295 22.730414 38.69637 60.55597
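The code that produced the two full tables above is not shown in the report. A minimal sketch of how per-cluster means and medians can be computed for every variable at once is given below. The small data frame and its values are hypothetical and only illustrate the mechanics; with the real data, the cluster column would hold the result of cutree().

```r
# Hypothetical commune-level data with a cluster column
toy <- data.frame(
  cluster = c(1, 1, 2, 2, 3, 3),
  femmes  = c(50, 51, 51, 52, 53, 54),
  chom    = c(14, 16, 9, 10, 17, 18)
)

# Mean and median of every other column, by cluster
means_all   <- aggregate(. ~ cluster, data = toy, FUN = mean)
medians_all <- aggregate(. ~ cluster, data = toy, FUN = median)

print(means_all)
print(medians_all)
```

The formula `. ~ cluster` aggregates all remaining columns, which avoids listing the 19 variables by hand.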

By Gender

The share of women varies across the clusters: it is lowest in the first cluster (about 50.3% on average) and highest in the third (about 53.2%). The share of men shows the opposite pattern, highest in the first cluster (about 49.7%) and lowest in the third (about 46.8%).

By Age

  • The first cluster has an intermediate share of young people aged 15-29 (about 18.2% on average), between those of the other two clusters, and the lowest share of people aged 75+ (about 9.4%).

  • In contrast, the second cluster has the lowest share of young people (about 15.5%) and the highest share of people aged 30-74 (about 56.3%).

  • The third cluster has the highest share of young people aged 15-29 (about 19.8%) as well as of older people aged 75+ (about 12.3%).

By Education

  • Comparing the clusters, the second cluster has the lowest mean in both education categories. The share of people with no diploma (dipl_aucun) is highest in the first cluster (about 22.9%), closely followed by the third (about 22.0%), against about 15.3% in the second. The mean of has_dip is by far the highest in the third cluster. Note that the has_dip values are not on a percentage scale, so they should be compared with caution.

By Occupation

  • The first cluster has the highest shares of manual workers (ouvr, about 40.8%) and of agricultural workers (agric, about 0.57%), and the lowest shares of managers (cadres, about 7.3%) and intermediate occupations (interm, about 19.8%). This is consistent with the results of the PCA, which showed an inverse correlation between managers and manual workers.

  • The second cluster stands out with the highest share of managers (about 12.9%) and of intermediate occupations (about 26.4%), and the lowest share of manual workers (about 26.1%).

  • The third cluster has a notable share of managers (about 11.5%), just behind the second cluster. Its share of manual workers (about 28.6%) lies between those of the other two clusters, and its share of agricultural workers (about 0.37%) is the lowest.

By housing

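  • Cluster 1 is the most balanced of the three: owners (about 56%) and renters (about 42%), and houses (about 53%) and apartments (about 46%), are in fairly similar proportions.

  • The PCA supports the finding that cluster 2 has a greater proportion of homeowners and a smaller proportion of renters than the other clusters.

  • The majority of people in cluster 3 live in apartments (about 64%). This cluster has the lowest share of homeowners (about 43%) and the highest shares of renters (about 55%) and of HLM residents (about 24%).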

Analysis

  • Cluster 1:

The first cluster has an intermediate share of young people aged 15-29 and the lowest share of people aged 75 and over. It has the highest shares of manual workers and of agricultural workers, and the lowest shares of managers and intermediate occupations. It also has the highest share of people with no diploma, closely followed by the third cluster. Housing is the most balanced of the three clusters, both between owners and renters and between houses and apartments.

  • Cluster 2:

The second cluster is distinct from the others in several ways. It has the lowest share of young people and the highest share of people aged 30-74. It has the highest shares of managers and of intermediate occupations and the lowest share of manual workers, so its occupational composition differs markedly from that of the other clusters. It is also characterized by the highest share of homeowners and the lowest share of renters.

  • Cluster 3:

The third cluster has the highest shares of both young (15-29) and old (75+) people. It has a notable share of managers, just behind the second cluster, and its share of people with no diploma is close to that of the first cluster. Most people in this cluster live in apartments, and it has the lowest share of homeowners and the highest share of renters.

Conclusion of Analysis

Overall, employment status is correlated with housing type and tenure (renting or owning) in both regions, and the pattern varies from commune to commune. In Franche-Comté, executive jobs are associated more with the female population than with the male population. In addition, executives (cadres) are correlated with intermediate occupations (interm) in both regions.

Recommendations for further analysis

A more in-depth analysis of the communes of both regions could investigate which types of communes exhibit similar economic conditions. More data would be needed for this, for example salary, tax payments, financial background and field of work, which could all be influencing factors.